Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions
Generative Flow Networks (GFlowNets) have demonstrated significant
performance improvements for generating diverse discrete objects x given a
reward function R(x) indicating the utility of the object, trained
independently from the GFlowNet by supervised learning to predict a desirable
property y given x. We hypothesize that this can lead to incompatibility
between the inductive optimization biases in training R and in training the
GFlowNet, potentially leading to worse samples and slow adaptation to changes
in the distribution. In this work, we build upon recent work on jointly
learning energy-based models with GFlowNets and extend it to learn the joint
distribution over multiple variables, such as peptide sequences and their
antimicrobial activity, which we call Joint Energy-Based GFlowNets (JEBGFNs). Joint learning of
the energy-based model, used as a reward for the GFlowNet, can resolve the
issues of incompatibility since both the reward function and the GFlowNet
sampler are trained jointly. We find that this joint training, or joint
energy-based formulation, leads to significant improvements in generating
antimicrobial peptides. As the training sequences arose from evolutionary or
artificial selection for high antibiotic activity, there is presumably some
structure in the distribution of sequences that reveals information about the
antibiotic activity. This yields an advantage for generative modeling of their
joint distribution over purely discriminative modeling. We also evaluate JEBGFNs in an
active learning setting for discovering antimicrobial peptides. Comment: 9 pages, 10 figures
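
A minimal sketch of the joint energy-based reward, assuming a toy one-hot
peptide encoding and an untrained MLP energy model (sizes and architecture
are illustrative, not the paper's):

import torch
import torch.nn as nn

AMINO_ACIDS, SEQ_LEN = 20, 8

class JointEnergy(nn.Module):
    # Scores a (sequence, activity) pair; lower energy = more compatible pair.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(SEQ_LEN * AMINO_ACIDS + 1, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, seq_onehot, activity):
        flat = seq_onehot.flatten(start_dim=-2)
        return self.net(torch.cat([flat, activity], dim=-1)).squeeze(-1)

energy = JointEnergy()
seqs = torch.nn.functional.one_hot(
    torch.randint(AMINO_ACIDS, (4, SEQ_LEN)), AMINO_ACIDS
).float()
activity = torch.ones(4, 1)                  # condition on "active"
reward = torch.exp(-energy(seqs, activity))  # GFlowNet reward exp(-E(x, y))
print(reward)
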
Multi-Fidelity Active Learning with GFlowNets
In recent decades, the capacity to generate large amounts of data in
science and engineering applications has been growing steadily. Meanwhile, the
progress in machine learning has turned it into a suitable tool to process and
utilise the available data. Nonetheless, many relevant scientific and
engineering problems present challenges where current machine learning methods
cannot yet efficiently leverage the available data and resources. For example,
in scientific discovery, we are often faced with the problem of exploring very
large, high-dimensional spaces, where querying a high fidelity, black-box
objective function is very expensive. Progress in machine learning methods that
can efficiently tackle such problems would help accelerate currently crucial
areas such as drug and materials discovery. In this paper, we propose the use
of GFlowNets for multi-fidelity active learning, where multiple approximations
of the black-box function are available at lower fidelity and cost. GFlowNets
are recently proposed methods for amortised probabilistic inference that have
proven efficient for exploring large, high-dimensional spaces and can hence be
practical in the multi-fidelity setting too. Here, we describe our algorithm
for multi-fidelity active learning with GFlowNets and evaluate its performance
in both well-studied synthetic tasks and practically relevant applications of
molecular discovery. Our results show that multi-fidelity active learning with
GFlowNets can efficiently leverage the availability of multiple oracles with
different costs and fidelities to accelerate scientific discovery and
engineering design. Comment: Code: https://github.com/nikita-0209/mf-al-gf
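
The acquisition trade-off can be illustrated with a toy utility-per-cost
score over (candidate, fidelity) pairs; the costs and the random utility
below are placeholders for a real information-gain estimate:

import numpy as np

rng = np.random.default_rng(0)
ORACLE_COSTS = [1.0, 10.0, 100.0]      # low-, mid-, high-fidelity oracles

def acquisition(candidate, fidelity):
    utility = rng.random()             # stand-in for an information-gain estimate
    return utility / ORACLE_COSTS[fidelity]

# A GFlowNet is trained to sample (candidate, fidelity) pairs with probability
# proportional to such a reward; here we just rank a random pool.
pool = [(c, m) for c in range(5) for m in range(3)]
ranked = sorted(pool, key=lambda cm: -acquisition(*cm))
print(ranked[:3])
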
GFlowNet-EM for learning compositional latent variable models
Latent variable models (LVMs) with discrete compositional latents are an
important but challenging setting due to a combinatorially large number of
possible configurations of the latents. A key tradeoff in modeling the
posteriors over latents is between expressivity and tractable optimization. For
algorithms based on expectation-maximization (EM), the E-step is often
intractable without restrictive approximations to the posterior. We propose the
use of GFlowNets, algorithms for sampling from an unnormalized density by
learning a stochastic policy for sequential construction of samples, for this
intractable E-step. By training GFlowNets to sample from the posterior over
latents, we take advantage of their strengths as amortized variational
inference algorithms for complex distributions over discrete structures. Our
approach, GFlowNet-EM, enables the training of expressive LVMs with discrete
compositional latents, as shown by experiments on non-context-free grammar
induction and on images using discrete variational autoencoders (VAEs) without
conditional independence enforced in the encoder. Comment: ICML 2023; code: https://github.com/GFNOrg/GFlowNet-E
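
A deliberately tiny, runnable instance of this idea, assuming a one-step
categorical latent and a Gaussian decoder, in which case the GFlowNet E-step
reduces to a trajectory-balance loss (log Z + log q(z|x) - log p(x, z))^2:

import torch
import torch.nn as nn

K, D, N = 4, 8, 16                       # latent classes, data dim, batch size
x = torch.randn(N, D)                    # toy observations
dec = nn.Parameter(torch.randn(K, D))    # decoder means: p(x|z) = N(dec[z], I)
policy = nn.Linear(D, K)                 # GFlowNet forward policy q(z|x)
log_Z = nn.Linear(D, 1)                  # per-input log-partition estimate
opt = torch.optim.Adam([dec, *policy.parameters(), *log_Z.parameters()], lr=1e-2)

for step in range(200):
    logits = policy(x)
    z = torch.distributions.Categorical(logits=logits).sample()
    log_q = torch.log_softmax(logits, dim=-1).gather(1, z[:, None]).squeeze(1)
    log_joint = -0.5 * ((x - dec[z]) ** 2).sum(-1)   # log p(x, z) up to constants
    # E-step: trajectory balance pushes q(z|x) toward p(z|x), with p held fixed
    tb_loss = (log_Z(x).squeeze(1) + log_q - log_joint.detach()) ** 2
    # M-step: raise log p(x, z) under latents drawn from the amortized sampler
    loss = tb_loss.mean() - log_joint.mean()
    opt.zero_grad(); loss.backward(); opt.step()
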
Amortizing intractable inference in large language models
Autoregressive large language models (LLMs) compress knowledge from their
training data through next-token conditional distributions. This limits
tractable querying of this knowledge to start-to-end autoregressive sampling.
However, many tasks of interest -- including sequence continuation, infilling,
and other forms of constrained generation -- involve sampling from intractable
posterior distributions. We address this limitation by using amortized Bayesian
inference to sample from these intractable posteriors. Such amortization is
algorithmically achieved by fine-tuning LLMs via diversity-seeking
reinforcement learning algorithms: generative flow networks (GFlowNets). We
empirically demonstrate that this distribution-matching paradigm of LLM
fine-tuning can serve as an effective alternative to maximum-likelihood
training and reward-maximizing policy optimization. As an important
application, we interpret chain-of-thought reasoning as a latent variable
modeling problem and demonstrate that our approach enables data-efficient
adaptation of LLMs to tasks that require multi-step rationalization and tool
use. Comment: 23 pages; code: https://github.com/GFNOrg/gfn-lm-tunin
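
The objective behind this kind of fine-tuning can be shown at toy scale. The
sketch below trains a small autoregressive sampler (a GRU standing in for an
LLM) with a trajectory-balance loss so that complete sequences are sampled
proportionally to a reward; the vocabulary, reward, and architecture are
illustrative assumptions:

import torch
import torch.nn as nn

V, L, B = 5, 6, 32                       # toy vocab, sequence length, batch
emb = nn.Embedding(V + 1, 32)            # +1 for a BOS token
rnn = nn.GRU(32, 32, batch_first=True)
head = nn.Linear(32, V)
log_Z = nn.Parameter(torch.zeros(()))
params = [*emb.parameters(), *rnn.parameters(), *head.parameters(), log_Z]
opt = torch.optim.Adam(params, lr=1e-3)

def log_reward(seqs):
    # Stand-in constraint: reward sequences that use many distinct tokens.
    return torch.tensor([float(s.unique().numel()) for s in seqs]).log()

for step in range(200):
    tok, h = torch.full((B, 1), V), None  # start every sequence at BOS
    log_pf, out = torch.zeros(B), []
    for t in range(L):
        o, h = rnn(emb(tok), h)
        dist = torch.distributions.Categorical(logits=head(o[:, -1]))
        sample = dist.sample()
        log_pf = log_pf + dist.log_prob(sample)
        tok = sample[:, None]
        out.append(tok)
    seqs = torch.cat(out, dim=1)
    # Trajectory balance: log Z + log P_F(seq) should equal log R(seq)
    loss = ((log_Z + log_pf - log_reward(seqs)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
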
BatchGFN: Generative Flow Networks for Batch Active Learning
We introduce BatchGFN -- a novel approach for pool-based active learning that
uses generative flow networks to sample sets of data points proportional to a
batch reward. With an appropriate reward function to quantify the utility of
acquiring a batch, such as the joint mutual information between the batch and
the model parameters, BatchGFN is able to construct highly informative batches
for active learning in a principled way. On toy regression problems, we show
that our approach enables sampling near-optimal-utility batches at inference
time with a single forward pass per point in the batch. This alleviates the
computational complexity of batch-aware algorithms and removes the need for
greedy approximations to find maximizers for the batch reward. We also present
early results for amortizing training across acquisition steps, which will
enable scaling to real-world tasks. Comment: Accepted at the Structured Probabilistic Inference & Generative Modeling workshop, ICML 2023
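
One way to picture the batch reward is with an ensemble-based BALD-style
proxy; this is an assumption for illustration and ignores the within-batch
correlations that the joint mutual information captures:

import torch

def batch_utility(logits):
    # logits: [ensemble, batch, classes] from stochastic forward passes.
    probs = logits.softmax(-1)
    mean = probs.mean(0)
    entropy_of_mean = -(mean * mean.clamp_min(1e-9).log()).sum(-1)
    mean_entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean(0)
    # Summing per-point BALD scores is only a proxy; the paper's joint
    # mutual information also accounts for correlations within the batch.
    return (entropy_of_mean - mean_entropy).sum()

ens_logits = torch.randn(8, 4, 3)   # 8 posterior samples, batch of 4, 3 classes
print(batch_utility(ens_logits))
# A GFlowNet is then trained to sample batches with probability
# proportional to (a function of) this utility.
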
Multi-Objective GFlowNets
We study the problem of generating diverse candidates in the context of
multi-objective optimization. In many applications of machine learning, such as
drug discovery and material design, the goal is to generate candidates which
simultaneously optimize a set of potentially conflicting objectives. Moreover,
these objectives are often imperfect evaluations of some underlying property of
interest, making it important to generate diverse candidates to have multiple
options for expensive downstream evaluations. We propose Multi-Objective
GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal
solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC,
which models a family of independent sub-problems defined by a scalarization
function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a
sequence of sub-problems defined by an acquisition function in an active
learning loop. Our experiments on a wide variety of synthetic and benchmark
tasks demonstrate the advantages of the proposed methods in terms of Pareto
performance and, importantly, improved candidate diversity, which is the main
contribution of this work. Comment: 23 pages, 8 figures. ICML 2023. Code at:
https://github.com/GFNOrg/multi-objective-gf
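
A toy sketch of the MOGFN-PC setup, where a preference vector w is drawn per
episode and the sampler is trained against the scalarized reward; the
objectives and the weighted-sum scalarization below are illustrative:

import numpy as np

rng = np.random.default_rng(0)

def objectives(x):
    # Two toy, partially conflicting objectives.
    return np.array([np.sin(x).mean(), np.cos(x).mean()])

def scalarized_reward(x, w):
    # Weighted-sum scalarization; MOGFN-PC conditions the sampler on w.
    return float(w @ objectives(x))

w = rng.dirichlet([1.0, 1.0])   # preference vector sampled per episode
x = rng.normal(size=5)          # stand-in for a candidate sampled given w
print(scalarized_reward(x, w))
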
PhyloGFN: Phylogenetic inference with generative flow networks
Phylogenetics is a branch of computational biology that studies the
evolutionary relationships among biological entities. Its long history and
numerous applications notwithstanding, inference of phylogenetic trees from
sequence data remains challenging: the high complexity of tree space poses a
significant obstacle for the current combinatorial and probabilistic
techniques. In this paper, we adopt the framework of generative flow networks
(GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and
Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling
complex combinatorial structures, they are a natural choice for exploring and
sampling from the multimodal posterior distribution over tree topologies and
evolutionary distances. We demonstrate that our amortized posterior sampler,
PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real
benchmark datasets. PhyloGFN is competitive with prior works in marginal
likelihood estimation and achieves a closer fit to the target distribution than
state-of-the-art variational inference methods.
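
The sequential state space can be pictured as follows: a state is a forest,
an action joins two roots, and in the parsimony setting the reward can be
exp(-parsimony). The Fitch small-parsimony count is standard; the three toy
taxa and the arbitrary join order (where a trained policy would choose) are
illustrative:

def fitch(node, seqs):
    # Return (state-sets per site, parsimony cost) for a nested-tuple tree.
    if isinstance(node, str):
        return [{c} for c in seqs[node]], 0
    (l, cl), (r, cr) = fitch(node[0], seqs), fitch(node[1], seqs)
    sets, cost = [], cl + cr
    for a, b in zip(l, r):
        inter = a & b
        sets.append(inter or (a | b))
        cost += 0 if inter else 1
    return sets, cost

seqs = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
forest = list(seqs)              # initial state: each taxon is its own tree
while len(forest) > 1:           # a GFlowNet policy would choose which pair to join
    t1, t2 = forest.pop(), forest.pop()
    forest.append((t1, t2))
print(forest[0], fitch(forest[0], seqs)[1])   # final tree and its parsimony score
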
Learning GFlowNets from partial episodes for improved convergence and stability
Generative flow networks (GFlowNets) are a family of algorithms for training
a sequential sampler of discrete objects under an unnormalized target density
and have been successfully used for various probabilistic modeling tasks.
Existing training objectives for GFlowNets are either local to states or
transitions, or propagate a reward signal over an entire sampling trajectory.
We argue that these alternatives represent opposite ends of a gradient
bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate
its harmful effects. Inspired by the TD(λ) algorithm in reinforcement
learning, we introduce subtrajectory balance, or SubTB(λ), a GFlowNet
training objective that can learn from partial action subsequences of varying
lengths. We show that SubTB(λ) accelerates sampler convergence in
previously studied and new environments and enables training GFlowNets in
environments with longer action sequences and sparser reward landscapes than
what was possible before. We also perform a comparative analysis of stochastic
gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet
training and the advantages of subtrajectory balance. Comment: ICML 2023
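
A direct, if naive, rendering of the SubTB(λ) objective on a single
trajectory, with random placeholder tensors standing in for learned flow and
policy values:

import torch

def subtb_lambda(log_F, log_pf, log_pb, lam=0.9):
    # log_F: log state flows F(s_0..s_T); log_F[-1] is log R at the terminal state.
    # log_pf, log_pb: per-step log forward/backward policy probabilities.
    T = log_pf.numel()
    total, weight = 0.0, 0.0
    for i in range(T):
        for j in range(i + 1, T + 1):
            # Balance error of the subtrajectory s_i -> s_j, weighted by lam^(j-i).
            err = (log_F[i] + log_pf[i:j].sum()) - (log_F[j] + log_pb[i:j].sum())
            total = total + lam ** (j - i) * err ** 2
            weight += lam ** (j - i)
    return total / weight

log_F = torch.randn(6, requires_grad=True)    # placeholder flow estimates
log_pf, log_pb = torch.randn(5), torch.randn(5)
print(subtb_lambda(log_F, log_pf, log_pb))
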
GFlowOut: Dropout with Generative Flow Networks
Bayesian inference offers principled tools to tackle many critical problems
with modern neural networks, such as poor calibration, poor generalization,
and data inefficiency. However, scaling Bayesian inference to large
architectures is challenging and requires restrictive approximations. Monte
Carlo Dropout has been widely used as a relatively cheap way to perform
approximate inference and to estimate uncertainty with deep neural networks.
Traditionally, the dropout mask
is sampled independently from a fixed distribution. Recent works show that the
dropout mask can be viewed as a latent variable, which can be inferred with
variational inference. These methods face two important challenges: (a) the
posterior distribution over masks can be highly multi-modal, which can be
difficult to approximate with standard variational inference, and (b) it is not
trivial to fully utilize sample-dependent information and correlation among
dropout masks to improve posterior estimation. In this work, we propose
GFlowOut to address these issues. GFlowOut leverages the recently proposed
probabilistic framework of Generative Flow Networks (GFlowNets) to learn the
posterior distribution over dropout masks. We empirically demonstrate that
GFlowOut results in predictive distributions that generalize better to
out-of-distribution data and provides uncertainty estimates that lead to
better performance in downstream tasks.
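
A minimal sketch of the sample-dependent-mask idea, with a plain sigmoid mask
network standing in for the GFlowNet sampler (all sizes and modules are
illustrative):

import torch
import torch.nn as nn

D, H = 16, 32
backbone = nn.Linear(D, H)
mask_net = nn.Linear(D, H)        # emits per-unit keep logits, conditioned on x
head = nn.Linear(H, 1)

x = torch.randn(4, D)
hid = torch.relu(backbone(x))
keep_probs = torch.sigmoid(mask_net(x))   # sample-dependent mask distribution
mask = torch.bernoulli(keep_probs)        # one posterior mask sample
out = head(hid * mask)
# Averaging `out` over several mask samples gives the predictive distribution;
# GFlowOut trains the mask sampler with a GFlowNet objective rather than
# sampling masks i.i.d. from a fixed Bernoulli(p).
print(out.shape)
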